DNA SEQUENCING BY PARALLEL
OLIGONUCLEOTIDE EXTENSIONS

This application is a continuation of Serial No. 08/872,446, filed June 10,
1997, which is a divisional of Serial No. 08/424,663, filed April 17, 1995,
herein incorporated by reference.

Field of the Invention 
The invention relates generally to methods for determining the nucleotide
sequence of a polynucleotide, and more particularly, to a method of identifying
nucleotides in a template by stepwise extension of one or more primers by
successive ligations of oligonucleotide blocks.

BACKGROUND 

Analysis of polynucleotides with currently available techniques provides a
spectrum of information ranging from the confirmation that a test polynucleotide is
the same as or different from a standard or an isolated fragment to the express
identification and ordering of each nucleoside of the test polynucleotide. Not only
are such techniques crucial for understanding the function and control of genes and
for applying many of the basic techniques of molecular biology, but they have also
become increasingly important as tools in genomic analysis and a great many
non-research applications, such as genetic identification, forensic analysis, genetic
counselling, medical diagnostics, and the like. In these latter applications both
techniques providing partial sequence information, such as fingerprinting and
sequence comparisons, and techniques providing full sequence determination have
been employed, e.g. Gibbs et al, Proc. Natl. Acad. Sci., 86: 1919-1923 (1989);
Gyllensten et al, Proc. Natl. Acad. Sci., 85: 7652-7656 (1988); Carrano et al,
Genomics, 4: 129-136 (1989); Caetano-Anolles et al, Mol. Gen. Genet., 235: 157-165
(1992); Brenner and Livak, Proc. Natl. Acad. Sci., 86: 8902-8906 (1989);
Green et al, PCR Methods and Applications, 1: 77-90 (1991); and Versalovic et al,
Nucleic Acids Research, 19: 6823-6831 (1991).

Native DNA consists of two linear polymers, or strands of nucleotides. 
Each strand is a chain of nucleosides linked by phosphodiester bonds. The two 
strands are held together in an antiparallel orientation by hydrogen bonds between
complementary bases of the nucleotides of the two strands: deoxyadenosine (A) 
pairs with thymidine (T) and deoxyguanosine (G) pairs with deoxycytidine (C). 
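
The pairing rule and the antiparallel orientation can be sketched in a few lines of Python. This is only an illustration: the function name `complementary_strand` is an assumption introduced here, and the input uses the 5'->3' letter convention defined later in this document.

```python
# Watson-Crick partners for the four deoxynucleosides named in the text.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5to3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    Because the two strands are antiparallel, the partner is the
    base-by-base complement read in the reverse direction.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5to3))


print(complementary_strand("ATGCCTG"))  # CAGGCAT
```

Applying the function twice returns the original strand, which is a quick check that the pairing table is symmetric.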

Presently there are two basic approaches to DNA sequence determination: 
the dideoxy chain termination method, e.g. Sanger et al, Proc. Natl. Acad. Sci., 74:
5463-5467 (1977); and the chemical degradation method, e.g. Maxam et al, Proc.
Natl. Acad. Sci., 74: 560-564 (1977). The chain termination method has been
improved in several ways, and serves as the basis for all currently available
automated DNA sequencing machines, e.g. Sanger et al, J. Mol. Biol., 143: 161-178
(1980); Schreier et al, J. Mol. Biol., 129: 169-172 (1979); Smith et al, Nucleic
Acids Research, 13: 2399-2412 (1985); Smith et al, Nature, 321: 674-679 (1987);
Prober et al, Science, 238: 336-341 (1987); Section H, Meth. Enzymol., 155: 51-334
(1987); Church et al, Science, 240: 185-188 (1988); Hunkapiller et al,
Science, 254: 59-67 (1991); Bevan et al, PCR Methods and Applications, 1: 222-228
(1992).

Both the chain termination and chemical degradation methods require the
generation of one or more sets of labeled DNA fragments, each having a common
origin and each terminating with a known base. The set or sets of fragments must
then be separated by size to obtain sequence information. In both methods, the
DNA fragments are separated by high resolution gel electrophoresis, which must
have the capacity of distinguishing very large fragments differing in size by no
more than a single nucleotide. Unfortunately, this step severely limits the size of
the DNA chain that can be sequenced at one time. Sequencing using these
techniques can reliably accommodate a DNA chain of up to about 400-450
nucleotides, Bankier et al, Meth. Enzymol., 155: 51-93 (1987); and Hawkins et al,
Electrophoresis, 13: 552-559 (1992).

Several significant technical problems have seriously impeded the 
application of such techniques to the sequencing of long target polynucleotides, 
e.g. in excess of 500-600 nucleotides, or to the sequencing of high volumes of 
many target polynucleotides. Such problems include i) the gel electrophoretic
separation step, which is labor intensive, is difficult to automate, and introduces an
extra degree of variability in the analysis of data, e.g. band broadening due to
temperature effects, compressions due to secondary structure in the DNA
sequencing fragments, inhomogeneities in the separation gel, and the like; ii)
nucleic acid polymerases whose properties, such as processivity, fidelity, rate of
polymerization, rate of incorporation of chain terminators, and the like, are often
sequence dependent; iii) detection and analysis of DNA sequencing fragments
which are typically present in fmol quantities in spatially overlapping bands in a
gel; iv) lower signals because the labelling moiety is distributed over the many
hundred spatially separated bands rather than being concentrated in a single
homogeneous phase; and v) in the case of single-lane fluorescence detection, the
availability of dyes with suitable emission and absorption properties, quantum
yield, and spectral resolvability, e.g. Trainor, Anal. Biochem., 62: 418-426 (1990);
Connell et al, Biotechniques, 5: 342-348 (1987); Karger et al, Nucleic Acids
Research, 19: 4955-4962 (1991); Fung et al, U.S. patent 4,855,225; and
Nishikawa et al, Electrophoresis, 12: 623-631 (1991).

Another problem exists with current technology in the area of diagnostic
sequencing. An ever widening array of disorders, susceptibilities to disorders,
prognoses of disease conditions, and the like, have been correlated with the
presence of particular DNA sequences, or the degree of variation (or mutation) in
DNA sequences, at one or more genetic loci. Examples of such phenomena
include human leukocyte antigen (HLA) typing, cystic fibrosis, tumor progression
and heterogeneity, p53 proto-oncogene mutations, ras proto-oncogene mutations,
and the like, e.g. Gyllensten et al, PCR Methods and Applications, 1: 91-98
(1991); Santamaria et al, International application PCT/US92/01675; Tsui et al,
International application PCT/CA90/00267; and the like. A difficulty in
determining DNA sequences associated with such conditions to obtain diagnostic
or prognostic information is the frequent presence of multiple subpopulations of
DNA, e.g. allelic variants, multiple mutant forms, and the like. Distinguishing the
presence and identity of multiple sequences with current sequencing technology is
virtually impossible without additional work to isolate and perhaps clone the
separate species of DNA.

A major advance in sequencing technology could be made if an alternative
approach were available for sequencing DNA that did not require high resolution
electrophoretic separations of DNA fragments, that generated signals more
amenable to analysis, and that provided a means for readily analyzing DNA from
heterozygous genetic loci.

An objective of the invention is to provide such an alternative approach to
presently available DNA sequencing technologies.

Summary of the Invention
The invention provides a method of nucleic acid sequence analysis based
on repeated cycles of duplex extension along a single stranded template.
Preferably, such extension starts from a duplex formed between an initializing
oligonucleotide and the template. The initializing oligonucleotide is extended in an
initial extension cycle by ligating an oligonucleotide probe to its end to form an
extended duplex. The extended duplex is then repeatedly extended by subsequent
cycles of ligation. During each cycle, the identity of one or more nucleotides in the
template is determined by a label on, or associated with, a successfully ligated
oligonucleotide probe. Preferably, the oligonucleotide probe has a blocking
moiety, e.g. a chain-terminating nucleotide, in a terminal position so that only a
single extension of the extended duplex takes place in a single cycle. The duplex is
further extended in subsequent cycles by removing the blocking moiety and
regenerating an extendable terminus.

In one aspect of the invention, a plurality of different initializing
oligonucleotides is provided for separate samples of the template. Each initializing
oligonucleotide forms a duplex with the template such that the end undergoing
extension is one or more nucleotides out of register, or phase, with that of every
other initializing oligonucleotide of the plurality. In other words, the starting
nucleotide for extension is different by one or more nucleotides for each of the
different initializing oligonucleotides. In this manner, after each cycle of extension
with oligonucleotide probes of the same length, the same relative phase exists
between the ends of the initializing oligonucleotides on the different templates.
Thus, in a preferred embodiment, where, for example, i) the initializing
oligonucleotides are out of phase by one nucleotide, ii) 9-mer oligonucleotide
probes are used in the extension step, and iii) nine different initializing
oligonucleotides are employed, nine template nucleotides will be identified
simultaneously in each extension cycle.
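
The phase arithmetic of this example can be tabulated with a short Python sketch. The function name, the position numbering (position 1 is the first template nucleotide beyond the end of the offset-0 initializing oligonucleotide), and the assumption that each ligated probe reports the template nucleotide opposite its ligated end are illustrative choices made here, not definitions taken from the text.

```python
def interrogated_positions(n_init: int, probe_len: int, cycles: int) -> dict[int, list[int]]:
    """Map each cycle number to the template positions read in that cycle.

    Assumptions (illustrative): initializing oligonucleotides are offset
    by 0, 1, ..., n_init - 1 nucleotides, every cycle ligates one probe of
    probe_len nucleotides, and the nucleotide identified is the one that
    abuts the end of the duplex being extended.
    """
    table = {}
    for cycle in range(1, cycles + 1):
        # After (cycle - 1) ligations the duplex end has advanced by
        # (cycle - 1) * probe_len; the next template position is read.
        table[cycle] = [offset + (cycle - 1) * probe_len + 1 for offset in range(n_init)]
    return table


table = interrogated_positions(n_init=9, probe_len=9, cycles=3)
for cycle, positions in table.items():
    print(cycle, positions)
```

With nine initializing oligonucleotides and 9-mer probes, cycle 1 covers positions 1 to 9, cycle 2 covers 10 to 18, and cycle 3 covers 19 to 27, so the reads from the nine samples tile the template without gaps or overlaps.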

Brief Description of the Drawings 
Figure 1 diagrammatically illustrates parallel extensions of multiple
templates in accordance with the invention.

Figure 2 diagrammatically illustrates an embodiment of the invention
employing acid-labile linkages.

Figure 3A diagrammatically illustrates an embodiment of the invention
employing RNase H labile oligonucleotides with 3'->5' extensions.

Figure 3B diagrammatically illustrates an embodiment of the invention
employing RNase H labile oligonucleotides with 5'->3' extensions.

Figure 4 diagrammatically illustrates an embodiment of the invention
employing ligation followed by polymerase extension and cleavage.



Definitions 

As used herein "sequence determination," "determining a nucleotide
sequence," "sequencing," and like terms, in reference to polynucleotides includes
determination of partial as well as full sequence information of the polynucleotide.
That is, the term includes sequence comparisons, fingerprinting, and like levels of
information about a target polynucleotide, as well as the express identification and
ordering of each nucleoside of the test polynucleotide.

"Perfectly matched duplex" in reference to the protruding strands of probes 
and target polynucleotides means that the protruding strand from one forms a
double stranded structure with the other such that each nucleotide in the double
stranded structure undergoes Watson-Crick basepairing with a nucleotide on the
opposite strand. The term also comprehends the pairing of nucleoside analogs,
such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may
be employed to reduce the degeneracy of the probes.

The term "oligonucleotide" as used herein includes linear oligomers of
nucleosides or analogs thereof, including deoxyribonucleosides, ribonucleosides,
and the like. Usually oligonucleotides range in size from a few monomeric units,
e.g. 3-4, to several hundreds of monomeric units. Whenever an oligonucleotide is
represented by a sequence of letters, such as "ATGCCTG," it will be understood
that the nucleotides are in 5'->3' order from left to right and that "A" denotes
deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T"
denotes thymidine, unless otherwise noted.

As used herein, "nucleoside" includes the natural nucleosides, including
2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA
Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to
nucleosides includes synthetic nucleosides having modified base moieties and/or
modified sugar moieties, e.g. described generally by Scheit, Nucleotide Analogs
(John Wiley, New York, 1980). Such analogs include synthetic nucleosides
designed to enhance binding properties, reduce degeneracy, increase specificity,
and the like.

As used herein, "ligation" means to form a covalent bond or linkage
between the termini of two or more nucleic acids, e.g. oligonucleotides and/or
polynucleotides, in a template-driven reaction. The nature of the bond or linkage
may vary widely and the ligation may be carried out enzymatically or chemically.

Detailed Description of the Invention

The invention provides a method of sequencing nucleic acids which
obviates electrophoretic separation of similarly sized DNA fragments, and which
eliminates the difficulties associated with the detection and analysis of spatially
overlapping bands of DNA fragments in a gel or like medium. The invention also
obviates the need to generate DNA fragments from long single stranded templates
with a DNA polymerase.

The general scheme of one aspect of the invention is shown
diagrammatically in Figure 1. As described more fully below, the invention is not
meant to be limited by the particular features of this embodiment. Template (20)
comprising a polynucleotide (50) of unknown sequence and binding region (40) is
attached to solid phase support (10). Preferably, for embodiments employing
N-mer probes, the template is divided into N aliquots, and for each aliquot a different
initializing oligonucleotide i_k is provided that forms a perfectly matched duplex at
a location in binding region (40) different from that of the other initializing
oligonucleotides. That is, the initializing oligonucleotides i_1-i_N form a set of
duplexes with the template in the binding region (40), such that the ends of the
duplexes proximal to the unknown sequence are from 0 to N-1 nucleotides from
the start of the unknown sequence. Thus, in the first cycle of ligations with N-mer
probes, a terminal nucleotide (16) of probe (30) ligated to i_1 in Figure 1 will be
complementary to the N-1 nucleotide of binding region (40). Likewise, a terminal
nucleotide (17) of probe (30) ligated to i_2 in Figure 1 will be complementary to the
N-2 nucleotide of binding region (40); a terminal nucleotide (18) of probe (30)
ligated to i_3 in Figure 1 will be complementary to the N-3 nucleotide of binding
region (40), and so on. Finally, a terminal nucleotide (15) of probe (30) ligated to
i_n will be complementary to the first nucleotide of unknown sequence (50). In
the second cycle of ligations, a terminal nucleotide (19) of probe (31) will be
complementary to the second nucleotide (19) of unknown sequence (50) in
duplexes starting with initializing oligonucleotide i_1. Likewise, terminal
nucleotides of probes ligated to duplexes starting with initializing oligonucleotides
i_2, i_3, i_4, and so on, will be complementary to the third, fourth, and fifth
nucleotides of unknown sequence (50).

In the above embodiment, the oligonucleotide probes are labeled so that 
the identity of the nucleotide abutting the extended duplex can be determined from 
the label. 
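
As a rough illustration of the bookkeeping in the Figure 1 scheme, the following Python sketch simulates N aliquots extended with N-mer probes and assembles the base calls. It is a toy model under stated assumptions, not the method itself: ligation is taken to be error-free, strand polarity is ignored (the template is simply read in the direction of extension), and all names (`matched_probe`, `sequence_by_ligation`, the example sequences) are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def matched_probe(template_window: str) -> str:
    """Probe from the full mixture that pairs perfectly with the window."""
    return "".join(COMPLEMENT[base] for base in template_window)


def sequence_by_ligation(binding: str, unknown: str, n: int) -> str:
    """Call bases of `unknown` using n aliquots and n-mer probes.

    Aliquot k (k = 1..n) starts extension n - k nucleotides before the
    unknown sequence, so the aliquot with k = n reads the first unknown
    nucleotide in its first cycle. Uncalled positions are reported as 'N'.
    """
    if n < 1 or n - 1 > len(binding):
        raise ValueError("binding region too short for the requested offsets")
    template = binding + unknown          # read in the direction of extension
    start = len(binding)                  # index of the first unknown nucleotide
    calls = {}
    for k in range(1, n + 1):
        pos = start - (n - k)             # first template index beyond i_k
        while pos + n <= len(template):   # a full-length probe must still fit
            probe = matched_probe(template[pos:pos + n])
            label = probe[0]              # label reports the ligated-end nucleotide
            calls[pos] = COMPLEMENT[label]
            pos += n                      # duplex end advances by one probe length
    return "".join(calls.get(i, "N") for i in range(start, len(template)))


binding = "GATTACAGATTACAGATTACAGATT"
unknown = "ACGTTGCAAGGCTTA"
print(sequence_by_ligation(binding, unknown, 4))  # ACGTTGCAAGGCNNN
```

In this toy model the last N-1 positions stay uncalled because a full-length probe no longer fits on the template; with a single initializing oligonucleotide and N = 1 every position is called.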

Binding region (40) has a known sequence, but can vary greatly in length
and composition. It must be sufficiently long to accommodate the hybridization of
an initializing oligonucleotide. Different binding regions can be employed with
either identical or different initializing oligonucleotides, but for convenience of
preparation, it is preferable to provide identical binding regions and different
initializing oligonucleotides. Thus, all the templates are prepared identically and
then separated into aliquots for use with different initializing oligonucleotides.
Preferably, the binding region should be long enough to accommodate a set of
different initializing oligonucleotides, each hybridizing to the template to produce a
different starting point for subsequent ligations. Most preferably, the binding
region is between about 20 and 50 nucleotides in length.

Initializing oligonucleotides are selected to form highly stable duplexes
with the binding region that remain intact during any washing steps of the
extension cycles. This is conveniently achieved by selecting the length(s) of the
initializing oligonucleotides to be considerably longer than that, or those, of the
oligonucleotide probes and/or by selecting them to be GC-rich. Initializing
oligonucleotides may also be cross-linked to the template strand by a variety of
techniques, e.g. Summerton et al, U.S. patent 4,123,610; or they may be
comprised of nucleotide analogs that form duplexes of greater stability than their
natural counterparts, e.g. peptide nucleic acids, Science, 254: 1497-1500 (1991);
Hanvey et al, Science, 258: 1481-1485 (1992); and PCT applications
PCT/EP92/01219 and PCT/EP92/01220.

Preferably, the length of the initializing oligonucleotide is from about 20 to
30 nucleotides and its composition comprises a sufficient percentage of G's and C's
to provide a duplex melting temperature that exceeds those of the oligonucleotide
probes being employed by about 10-50°C. More preferably, the duplex melting
temperature of the initializing oligonucleotide exceeds those of the oligonucleotide
probes by about 20-50°C. The number, N, of distinct initializing oligonucleotides
employed in a sequencing operation can vary from one, in the case where a single
nucleotide is identified at each cycle, to a plurality whose size is limited only by the
size of oligonucleotide probe that can be practically employed. Factors limiting the
size of the oligonucleotide probe include the difficulty in preparing mixtures having
sufficiently high concentrations of individual probes to drive hybridization
reactions at a reasonable rate, the susceptibility of longer probes to forming
secondary structures, reduction in sensitivity to single base mismatches, and the
like. Preferably, N is in the range of from 1 to 16; more preferably, N is in the
range of from 1 to 12; and most preferably, N is in the range of from 1 to 8.
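
A very rough way to compare duplex stabilities is sketched below in Python. The text does not say how melting temperatures are to be estimated; the two rule-of-thumb formulas used here (the Wallace rule for short oligonucleotides and a common GC-content formula for longer ones) are stand-ins chosen for illustration, they ignore salt and concentration effects, and the example sequences are hypothetical.

```python
def tm_wallace(seq: str) -> float:
    """Wallace rule of thumb for short oligonucleotides: 2(A+T) + 4(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def tm_gc_content(seq: str) -> float:
    """Common GC-content estimate for longer oligonucleotides (over ~13 nt)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


# Hypothetical sequences: a GC-rich 24-mer initializing oligonucleotide
# and an 8-mer probe with 50% GC content.
initializer = "GCCGGCATGCCGGCTAGCCGGCAT"
probe = "ATGCATGC"

margin = tm_gc_content(initializer) - tm_wallace(probe)
print(round(tm_gc_content(initializer), 1), tm_wallace(probe), round(margin, 1))
```

Because the two estimates come from different approximations, the computed margin is only a coarse guide to whether a candidate initializing oligonucleotide is likely to fall in the preferred range relative to the probes.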

A wide variety of oligonucleotide probes can be used with the invention.
Generally, the oligonucleotide probes should be capable of being ligated to an
initializing oligonucleotide or extended duplex to generate the extended duplex of
the next extension cycle; the ligation should be template-driven in that the probe
should form a duplex with the template prior to ligation; the probe should possess a
blocking moiety to prevent multiple probe ligations on the same template in a single
extension cycle; the probe should be capable of being treated or modified to
regenerate an extendable end after ligation; and the probe should possess a signaling
moiety that permits the acquisition of sequence information relating to the template
after a successful ligation. As described more fully below, depending on the
embodiment, the extended duplex or initializing oligonucleotide may be extended in
either the 5'->3' direction or the 3'->5' direction by oligonucleotide probes.
Generally, the oligonucleotide probe need not form a perfectly matched duplex with
the template, although such binding is usually preferred. In preferred embodiments
in which a single nucleotide in the template is identified in each extension cycle,
perfect base pairing is only required for identifying that particular nucleotide. For
example, in embodiments where the oligonucleotide probe is enzymatically ligated
to an extended duplex, perfect base pairing, i.e. proper Watson-Crick base pairing,
is required between the terminal nucleotide of the probe which is ligated and its
complement in the template. Generally, in such embodiments, the rest of the
nucleotides of the probe serve as "spacers" that ensure the next ligation will take
place at a predetermined site, or number of bases, along the template. That is, their
pairing, or lack thereof, does not provide further sequence information. Likewise,
in embodiments that rely on polymerase extension for base identification, the probe
primarily serves as a spacer, so specific hybridization to the template is not critical,
although it is desirable.

Preferably, the oligonucleotide probes are applied to templates as mixtures
comprising oligonucleotides of all possible sequences of a predetermined length.
The complexity of such mixtures can be reduced by a number of methods,
including using so-called degeneracy-reducing analogs, such as deoxyinosine and
the like, e.g. as taught by Kong Thoo Lin et al, Nucleic Acids Research, 20: 5149-5152;
U.S. patent 5,002,867; Nichols et al, Nature, 369: 492-493 (1994); or by
separately applying multiple mixtures of oligonucleotide probes, e.g. four mixtures
comprising four disjoint subsets of oligonucleotide sequences that taken together
would comprise all possible sequences of the predetermined length.
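
The size of such a mixture, and one way of dividing it into four disjoint subsets, can be sketched in Python. The text does not specify how the subsets are chosen; splitting by the base at one fixed position (here the first) is an assumption made only for illustration, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def all_probes(length: int) -> list[str]:
    """Enumerate every sequence of the given length (4**length in total)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def split_by_first_base(probes: list[str]) -> dict[str, list[str]]:
    """Partition probes into four disjoint subsets keyed by their first base."""
    subsets = {base: [] for base in BASES}
    for probe in probes:
        subsets[probe[0]].append(probe)
    return subsets


probes = all_probes(4)
subsets = split_by_first_base(probes)
print(len(probes), {base: len(group) for base, group in subsets.items()})
```

For 4-mers this gives 256 sequences in four subsets of 64. The same enumeration grows as 4 to the power of the probe length, which is one reason the complexity of long-probe mixtures becomes a practical concern.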

Initializing oligonucleotides and oligonucleotide probes of the invention are
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied
Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA
Synthesizer, using standard chemistries, such as phosphoramidite chemistry, e.g.
disclosed in the following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311
(1992); Molko et al, U.S. patent 4,980,460; Koster et al, U.S. patent
4,725,677; Caruthers et al, U.S. patents 4,415,732; 4,458,066; and 4,973,679; and
the like. Alternative chemistries, e.g. resulting in non-natural backbone groups,
such as phosphorothioate, phosphoramidate, and the like, may also be employed
provided that the resulting oligonucleotides are compatible with the ligation and
other reagents of a particular embodiment. Mixtures of oligonucleotide probes are
readily synthesized using well known techniques, e.g. as disclosed in Telenius et al,
Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids Research, 19: 5275-5279
(1991); Grothues et al, Nucleic Acids Research, 21: 1321-1322 (1993);
Hartley, European patent application 90304496.4; and the like. Generally, these
techniques simply call for the application of mixtures of the activated monomers to
the growing oligonucleotide during the coupling steps where one desires to
introduce the degeneracy.

When conventional ligases are employed in the invention, as described
more fully below, the 5' end of the probe may be phosphorylated in some
embodiments. A 5' monophosphate can be attached to an oligonucleotide either
chemically or enzymatically with a kinase, e.g. Sambrook et al, Molecular Cloning:
A Laboratory Manual, 2nd Edition (Cold Spring Harbor Laboratory, New York,
1989). Chemical phosphorylation is described by Horn and Urdea, Tetrahedron
Lett., 27: 4705 (1986), and reagents for carrying out the disclosed protocols are
commercially available, e.g. 5' Phosphate-ON™ from Clontech Laboratories
(Palo Alto, California). Preferably, when required, oligonucleotide probes are
chemically phosphorylated.

The probes of the invention can be labeled in a variety of ways, including
the direct or indirect attachment of fluorescent moieties, colorimetric moieties, and
the like. Many comprehensive reviews of methodologies for labeling DNA and
constructing DNA probes provide guidance applicable to constructing probes of
the present invention. Such reviews include Matthews et al, Anal. Biochem., Vol
169, pgs. 1-25 (1988); Haugland, Handbook of Fluorescent Probes and Research
Chemicals (Molecular Probes, Inc., Eugene, 1992); Keller and Manak, DNA
Probes, 2nd Edition (Stockton Press, New York, 1993); and Eckstein, editor,
Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford,
1991); and the like. Many more particular methodologies applicable to the
invention are disclosed in the following sample of references: Fung et al, U.S.
patent 4,757,141; Hobbs, Jr., et al, U.S. patent 5,151,507; Cruickshank, U.S.
patent 5,091,519 (synthesis of functionalized oligonucleotides for attachment of
reporter groups); Jablonski et al, Nucleic Acids Research, 14: 6115-6128
(1986) (enzyme-oligonucleotide conjugates); and Urdea et al, U.S. patent
5,124,246 (branched DNA).

Preferably, the probes are labeled with one or more fluorescent dyes, e.g.
as disclosed by Menchen et al, U.S. patent 5,188,934; Bergot et al, PCT application
PCT/US90/05565.

Guidance in selecting hybridization conditions for the application of
oligonucleotide probes to templates can be found in numerous references, e.g.
Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26: 227-259
(1991); Dove and Davidson, J. Mol. Biol., 5: 467-478 (1962); Hutton, Nucleic
Acids Research, 10: 3537-3555 (1977); Breslauer et al, Proc. Natl. Acad. Sci., 83:
3746-3750 (1986); Innis et al, editors, PCR Protocols (Academic Press, New
York, 1990); and the like.

Generally, when an oligonucleotide probe anneals to a template in
juxtaposition to an end of the extended duplex, the duplex and probe are ligated,
i.e. are caused to be covalently linked to one another. Ligation can be
accomplished either enzymatically or chemically. Chemical ligation methods are
well known in the art, e.g. Ferris et al, Nucleosides & Nucleotides, 8: 407-414
(1989); Shabarova et al, Nucleic Acids Research, 19: 4247-4251 (1991); and the
like. Preferably, enzymatic ligation is carried out using a ligase in a standard
protocol. Many ligases are known and are suitable for use in the invention, e.g.
Lehman, Science, 186: 790-797 (1974); Engler et al, DNA Ligases, pages 3-30 in
Boyer, editor, The Enzymes, Vol 15B (Academic Press, New York, 1982); and
the like. Preferred ligases include T4 DNA ligase, T7 DNA ligase, E. coli DNA
ligase, Taq ligase, Pfu ligase, and Tth ligase. Protocols for their use are well
known, e.g. Sambrook et al (cited above); Barany, PCR Methods and
Applications, 1: 5-16 (1991); Marsh et al, Strategies, 5: 73-76 (1992); and the like.
Generally, ligases require that a 5' phosphate group be present for ligation to the 3'
hydroxyl of an abutting strand.

Preparing Target Polynucleotides

Preferably, a target polynucleotide is conjugated to a binding region to
form a template, and the template is attached to a solid phase support, such as a
magnetic particle, polymeric microsphere, filter material, or the like, which permits
the sequential application of reagents without complicated and time-consuming
purification steps. The length of the target polynucleotide can vary widely;
however, for convenience of preparation, lengths employed in conventional
sequencing are preferred. For example, lengths in the range of a few hundred
basepairs, 200-300, to 1 to 2 kilobase pairs are preferred.

The target polynucleotides can be prepared by various conventional
methods. For example, target polynucleotides can be prepared as inserts of any of
the conventional cloning vectors, including those used in conventional DNA
sequencing. Extensive guidance for selecting and using appropriate cloning
vectors is found in Sambrook et al, Molecular Cloning: A Laboratory Manual,
Second Edition (Cold Spring Harbor Laboratory, New York, 1989), and like
references. Sambrook et al and Innis et al, editors, PCR Protocols (Academic
Press, New York, 1990) also provide guidance for using polymerase chain
reactions to prepare target polynucleotides. Preferably, cloned or PCR-amplified
target polynucleotides are prepared which permit attachment to magnetic beads, or
other solid supports, for ease of separating the target polynucleotide from other
reagents used in the method. Protocols for such preparative techniques are
described fully in Wahlberg et al, Electrophoresis, 13: 547-551 (1992); Tong et al,
Anal. Chem., 64: 2672-2677 (1992); Hultman et al, Nucleic Acids Research, 17:
4937-4946 (1989); Hultman et al, Biotechniques, 10: 84-93 (1991); Syvanen et al,
Nucleic Acids Research, 16: 11327-11338 (1988); Dattagupta et al, U.S. patent
4,734,363; Uhlen, PCT application PCT/GB89/00304; and like references. Kits
are also commercially available for practicing such methods, e.g. Dynabeads™
template preparation kit from Dynal AS (Oslo, Norway).

Generally, the size and shape of the microparticles or beads employed in the
method of the invention are not critical; however, microparticles in the size range of
a few, e.g. 1-2, to several hundred, e.g. 200-1000 µm diameter are preferable, as
they minimize reagent and sample usage while permitting the generation of readily
detectable signals, e.g. from fluorescently labeled probes.

Schemes for Ligating, Capping, and
Regenerating Extendable Termini

In one aspect, the invention calls for repeated steps of ligating and
identifying oligonucleotide probes. However, since the ligation of multiple
probes to the same extended duplex in the same step would usually introduce
identification problems, it is useful to prevent multiple extensions and to regenerate
extendable termini. Moreover, if the ligation step is not 100% efficient, it would
be desirable to cap extended duplexes that fail to undergo ligation so that they do
not participate in any further ligation steps. That is, a capping step preferably
occurs after a ligation step, by analogy with other synthetic chemical processes,
such as polynucleotide synthesis, e.g. Andrus et al, U.S. patent 4,816,571. This
would remove a potentially significant source of noise from signals generated in
subsequent identification steps.
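
The effect of incomplete ligation, with and without capping, can be illustrated with a simple Python model. The model is an assumption made here for illustration: every duplex ligates with the same fixed efficiency in each cycle, a duplex that misses a ligation falls permanently out of phase, and capping blocks such duplexes completely.

```python
def phase_fractions(efficiency: float, cycles: int, capped: bool) -> list[tuple[float, float]]:
    """Return (in_phase, out_of_phase_but_extendable) fractions after each cycle."""
    in_phase = 1.0
    lagging = 0.0
    history = []
    for _ in range(cycles):
        failed = in_phase * (1.0 - efficiency)
        in_phase *= efficiency
        if not capped:
            # Without capping, failed duplexes remain extendable and can
            # ligate probes in later cycles at the wrong template positions.
            lagging += failed
        # With capping, failed duplexes are blocked and drop out entirely.
        history.append((in_phase, lagging))
    return history


for capped in (False, True):
    in_phase, lagging = phase_fractions(efficiency=0.9, cycles=10, capped=capped)[-1]
    print(capped, round(in_phase, 3), round(lagging, 3))
```

In this model the in-phase fraction falls as the efficiency raised to the number of cycles in both cases; capping does not restore signal, but it keeps the out-of-phase duplexes from contributing to later identification steps.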

Below, several exemplary schemes for carrying out ligation, capping, 
regeneration, and identification steps in accordance with the invention are 
described. They are presented for purposes of guidance and are not meant to be 
limiting. 

A scheme for extending an initializing oligonucleotide or an extended 
duplex in the 3'->5' direction is illustrated in Figure 2. Template (20) is attached
to solid phase support (10) by its 5' end. This can be conveniently accomplished
via a biotin, or like linking moiety, using conventional techniques. Initializing
oligonucleotide (200) having a 5' phosphate group is annealed to template (20) as
described above prior to the initial cycle of ligation and identification. An
oligonucleotide probe (202) of the following form is employed:

HO-(3')BBB ... BBB(5')-OP(=O)(O-)NH-Bt*

where BBB ... BBB represents the sequence of nucleotides of oligonucleotide
probe (202) and Bt* is a labeled chain-terminating moiety linked to the 5' carbon
of the oligonucleotide via a phosphoramidate group, or other labile linkage, such
as a photocleavable linkage. The nature of Bt* may vary widely. It can be a
labeled nucleoside (e.g. coupled via a 5'P→3'N phosphoramidate) or other moiety,
so long as it prevents successive ligations. It may simply be a label connected by a
linker, such as described in Agrawal and Tang, International application
PCT/US91/08347. An important feature of the oligonucleotide probe is that after
annealing and ligation (204), the label may be removed and the extendable end
regenerated by treating the phosphoramidate linkage with acid, e.g. as taught by
Letsinger et al, J. Am. Chem. Soc., 94: 292-293 (1971); Letsinger et al, Biochem.,
15: 2810-2816 (1976); Gryaznov et al, Nucleic Acids Research, 20: 3403-3409
(1992); and like references. By way of example, hydrolysis of the
phosphoramidate may be accomplished by treatment with 0.8% trifluoroacetic acid
in dichloromethane for 40 minutes at room temperature. Thus, after annealing,
ligating, and identifying the ligated probe via the label on Bt*, the
chain-terminating moiety is cleaved by acid hydrolysis (206), thereby breaking the
phosphorus linkage and leaving a 5' monophosphate on the ligated oligonucleotide.
The steps can be repeated (208) in successive cycles. In one aspect of this
embodiment, a single initializing oligonucleotide may be employed such that only
one nucleotide is identified in each sequencing cycle. For such an embodiment, the
above probe preferably has the following form:



HO-(3')B(5')-OP(=O)(O-)NH-BB ... BBB-Bt*

Thus, after each ligation and acid cleavage step the duplex will be extended by
one nucleotide.
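
The one-nucleotide-per-cycle behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the actual chemistry: the function name, the 6-mer probe length, and the assumption that the label reports the probe nucleotide immediately adjacent to the duplex are all illustrative choices.

```python
# Toy model: each cycle ligates the perfectly matched probe, reads its label,
# and cleaves the probe so that only one nucleotide remains on the duplex.
# All names and the default probe length are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_by_single_base_cycles(template, probe_len=6):
    """Return the template bases called, one per ligation/cleavage cycle.

    `template` is the single-stranded region next to the initializing
    oligonucleotide, written in the direction in which the duplex grows.
    """
    called = []
    position = 0
    while position + probe_len <= len(template):
        window = template[position:position + probe_len]
        # Only the perfectly complementary probe anneals and is ligated.
        probe = "".join(COMPLEMENT[base] for base in window)
        # Assumption: the label reports the probe nucleotide next to the duplex.
        labeled_nucleotide = probe[0]
        called.append(COMPLEMENT[labeled_nucleotide])
        # Cleavage removes all but that one nucleotide, so advance by one.
        position += 1
    return "".join(called)


if __name__ == "__main__":
    print(read_by_single_base_cycles("ACGTTGCAAT"))
```

In this sketch the read stops once fewer than `probe_len` template positions remain, since a full-length probe can no longer anneal.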

A capping step may be introduced prior to hydrolysis. For example, probe
(202) may have the form:

HO-(3')BB ... Bp^B ... BB(5')-OP(=O)(O-)NH-Bt*

where "p^" is an exonuclease-resistant linkage, such as phosphorothioate,
methylphosphonate, or the like. In such an embodiment, capping can be achieved
by treating the extended duplexes with an exonuclease, such as λ exonuclease,
which will cleave the unligated extended duplexes back to the
exonuclease-resistant linkage. The presence of this linkage at the 5' end of the
extended duplex will then prevent it from participating in subsequent ligations.
Clearly, many other capping methodologies may be employed, e.g. acylation,
ligation of an inert oligonucleotide, or the like. When free 3' hydroxyls are
involved, capping may be accomplished by extending the duplex with a DNA
polymerase in the presence of chain-terminating nucleoside triphosphates, e.g.
dideoxynucleoside triphosphates, or the like.

The phosphoramidate linkage described above is an example of a general
class of internucleosidic linkages referred to herein as "chemically scissile
internucleosidic linkages." These are internucleosidic linkages that may be cleaved
by treating them with characteristic chemical or physical conditions, such as an
oxidizing environment, a reducing environment, light of a characteristic
wavelength (for photolabile linkages), or the like. Other examples of chemically
scissile internucleosidic linkages which may be used in accordance with the
invention are described in Urdea, U.S. patent 5,380,833; Gryaznov et al, Nucleic
Acids Research, 21: 1403-1408 (1993) (disulfide); Gryaznov et al, Nucleic Acids
Research, 22: 2366-2369 (1994) (bromoacetyl); Urdea et al, International
application PCT/US91/05287 (photolabile); and like references.

Further chemically scissile linkages that may be employed with the
invention include chain-terminating nucleotides that may be chemically converted
into an extendable nucleoside. Examples of such compounds are described in the
following references: Canard et al, International application PCT/FR94/00345;
Ansorge, German patent application No. DE 4141178 A1; Metzker et al, Nucleic
Acids Research, 22: 4259-4267 (1994); Cheeseman, U.S. patent 5,302,509; Ross
et al, International application PCT/US90/06178; and the like.

A scheme for extending an initializing oligonucleotide or an extended
duplex in the 5'→3' direction is illustrated in Figure 3A. Template (20) is attached
to solid phase support (10) by its 3' end. As above, this can be conveniently
accomplished via a biotin, or like linking moiety, using conventional techniques.
Initializing oligonucleotide (300) having a 3' hydroxyl group is annealed to
template (20) as described above prior to the initial cycle of ligation and
identification. An oligonucleotide probe (302) of the following form is employed:

OP(=O)(O-)O-(5')BBB ... BBBRRRRBt*

where BBB ... BBBRRRR represents the sequence of 2'-deoxynucleotides of
oligonucleotide probe (302), "RRRR" represents a sequence of four ribonucleotides
of probe (302), and Bt* is a labeled chain-terminating moiety, as described above.
Such mixed RNA-DNA oligonucleotides are readily synthesized using
conventional automated DNA synthesizers, e.g. Duck et al, U.S. patent 5,011,769.
RNase H will cleave the probe specifically in the center of the four ribonucleotide
segment, Hogrefe et al, J. Biol. Chem., 265: 5561-5566 (1990), leaving a 3'
hydroxyl (312) on the extended duplex, which may participate in subsequent
ligation steps. Thus, a cycle in the present embodiment proceeds by annealing
probe (302) to template (20) and ligating (304) to form extended duplex (306).
After identification via Bt*, the extended duplex is treated with RNase H to cleave
the label and regenerate an extendable end. The cycle is then repeated (314).
Capping (310) can be carried out prior to RNase H treatment by extending the
unligated ends with a DNA polymerase in the presence of the four
dideoxynucleoside triphosphates, ddATP, ddCTP, ddGTP, and ddTTP.

As illustrated in Figure 3B, a similar scheme can be employed for 3'→5'
extensions. In such an embodiment, initiating oligonucleotide or extended duplex
(330) has a 5' monophosphate and the oligonucleotide probe (332) has the form:

HO-(3')BBB ... BBBRRRRB ... BBt*

As above, after annealing, ligating (334), and identifying (338), extended duplex
(336) is cleaved by RNase H, which in this case leaves a 5' monophosphate (342) at
the terminus of the extended duplex. With the regenerated extendable end, the
cycle can be repeated (344). A capping step can be included prior to RNase H
hydrolysis by either ligating an unlabeled non-RNA-containing probe, or by
removing any remaining 5' monophosphates by treatment with a phosphatase.

Identification of nucleotides can be accomplished by polymerase extension
following ligation. As exemplified in Figure 4, for this embodiment, template (20)
is attached to solid phase support (10) as described above and initializing
oligonucleotide (400) having a 3' hydroxyl is annealed to the template prior to the
initial cycle. Oligonucleotide probes (402) of the form:

OP(=O)(O-)O-(5')BBB ... BBBRRRRB ... B(3')OP(=O)(O-)O

are annealed to template (20) and ligated (404) to form extended duplex (406).
The 3' monophosphate, which prevents successive ligations of probes in the same
cycle, is removed with phosphatase (408) to expose a free 3' hydroxyl (410).
Clearly, alternative blocking approaches may also be used. Extended duplex (406)
is further extended by a nucleic acid polymerase in the presence of labeled
dideoxynucleoside triphosphates (412), thereby permitting the identification of a
nucleotide of template (20) by the label of the incorporated dideoxynucleotide.
The labeled dideoxynucleotide and a portion of probe (402) are then cleaved
(414), for example, by RNase H treatment, to regenerate an extendable end on
extended duplex (406). The cycle is then repeated (416).

In order to reduce the number of separate annealing reactions that must be
carried out, the oligonucleotide probes may be grouped into mixtures, or subsets,
of probes whose perfectly matched duplexes with complementary sequences have
similar stability or free energy of binding. Such subsets of oligonucleotide probes
having similar duplex stability are referred to herein as "stringency classes" of
oligonucleotide probes. The mixtures, or stringency classes, of oligonucleotide
probes are then separately combined with the target polynucleotide under
conditions such that substantially only oligonucleotide probes complementary to
the target polynucleotide form duplexes. That is, the stringency of the
hybridization reaction is selected so that substantially only perfectly
complementary oligonucleotide probes form duplexes. These perfectly matched
duplexes are then ligated to form extended duplexes. For a given oligonucleotide
probe length, the number of oligonucleotide probes within each stringency class
can vary widely. Selection of oligonucleotide probe length and stringency class
size depends on several factors, such as length of target sequence and how it is
prepared, the extent to which the hybridization reactions can be automated, the
degree to which the stringency of the hybridization reaction can be controlled, the
presence or absence of oligonucleotide probes with complementary sequences, and
the like. Guidance in selecting an appropriate size of stringency class for a
particular embodiment can be found in the general literature on nucleic acid
hybridization and polymerase chain reaction methodology, e.g. Gotoh, Adv.
Biophys. 16: 1-52 (1983); Wetmur, Critical Reviews in Biochemistry and
Molecular Biology 26: 227-259 (1991); Breslauer et al, Proc. Natl. Acad. Sci. 83:
3746-3750 (1986); Wolf et al, Nucleic Acids Research, 15: 2911-2926 (1987);
Innis et al, editors, PCR Protocols (Academic Press, New York, 1990); McGraw
et al, Biotechniques, 8: 674-678 (1990), and the like. Stringency can be controlled
by varying several parameters, including temperature, salt concentration,
concentration of certain organic solvents, such as formamide, and the like.
Preferably, temperature is used to define the stringency classes because the activity
of the various polymerases or ligases employed limits the degree to which salt
concentration or organic solvent concentration can be varied for ensuring specific
annealing of the oligonucleotide probes.
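
The grouping of probes into stringency classes can be sketched as a rank-and-partition step. The sketch below is illustrative only: the scoring function is a deliberately crude stand-in that weights G/C pairs more heavily than A/T pairs, it is not the nearest-neighbor free energy calculation of the cited literature, and the function names are assumptions.

```python
from itertools import product

# Stand-in stability weights (an assumption for illustration only); these are
# NOT published nearest-neighbor free energy parameters.
PAIR_WEIGHT = {"A": 2, "T": 2, "G": 4, "C": 4}


def stability_score(probe):
    """Crude duplex-stability proxy: sum of per-base weights."""
    return sum(PAIR_WEIGHT[base] for base in probe)


def stringency_classes(probe_len, n_classes):
    """Rank all probes of one length by score and cut the ranking into classes.

    Class sizes differ by at most one probe.
    """
    probes = ["".join(p) for p in product("ACGT", repeat=probe_len)]
    ranked = sorted(probes, key=stability_score, reverse=True)
    base_size, extra = divmod(len(ranked), n_classes)
    classes, start = [], 0
    for index in range(n_classes):
        size = base_size + (1 if index < extra else 0)
        classes.append(ranked[start:start + size])
        start += size
    return classes


if __name__ == "__main__":
    classes = stringency_classes(probe_len=6, n_classes=8)
    print(len(classes), [len(c) for c in classes])
```

A real implementation would replace `stability_score` with a free energy computed from an established parameter set; the partitioning logic would stay the same.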

Generally, the larger the stringency class the greater the complexity of the
hybridizing mixture and the lower the concentration of any particular
oligonucleotide probe in the mixture. A lower concentration of an oligonucleotide
probe having a complementary site on a target polynucleotide reduces the relative
likelihood of the oligonucleotide probe hybridizing and being ligated. This, in turn,
leads to reduced sensitivity. Larger stringency classes also have a greater variance
in the stabilities of the duplexes that form between an oligonucleotide probe and a
complementary sequence. On the other hand, smaller stringency classes require a
larger number of hybridization reactions to ensure that all oligonucleotide probes
of a set are hybridized to a target polynucleotide.

For example, when 8-mer oligonucleotide probes are employed, stringency
classes may include from about 50 to about 500 oligonucleotide probes each.
Thus, several hundred to several thousand hybridization/ligation reactions are
required. For larger sized oligonucleotide probes, much larger stringency classes
are required to make the number of hybridization/extension reactions practical, e.g.
10^4-10^5, or more.
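
The trade-off between class size and the number of reactions follows from simple counting. The sketch below assumes a complete set of 4^n probes of length n divided into classes of equal nominal size; the function name is illustrative.

```python
import math


def reactions_needed(probe_len, class_size):
    """Number of separate annealing reactions for a complete probe set."""
    total_probes = 4 ** probe_len  # every possible sequence of this length
    return math.ceil(total_probes / class_size)


if __name__ == "__main__":
    for class_size in (50, 500):
        print(class_size, reactions_needed(8, class_size))
```

Under these assumptions a complete set of 65,536 8-mers needs on the order of a hundred to over a thousand annealing reactions per template aliquot, broadly in line with the range quoted above.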

Oligonucleotide probes of the same stringency class can be synthesized
simultaneously, in a manner similar to that in which fully random oligonucleotide
probes are synthesized, e.g. as disclosed in Telenius et al, Genomics, 13: 718-725
(1992); Welsh et al, Nucleic Acids Research, 19: 5275-5279 (1991); Grothues et al,
Nucleic Acids Research, 21: 1321-1322 (1993); Hartley, European patent
application 90304496.4; and the like. The difference is that at each cycle different
mixtures of monomers are applied to the growing oligonucleotide probe chain,
wherein the proportion of each monomer in the mixture is dictated by the
proportion of each nucleoside at the position of the oligonucleotide probe in the
stringency class. Stringency classes are readily formed by computing the free
energy of duplex formation by available algorithms, e.g. Breslauer et al, Proc. Natl.
Acad. Sci., 83: 3746-3750 (1986); Lowe et al, Nucleic Acids Research, 18:
1757-1761 (1990); or the like. The oligonucleotide probes can be ordered according
to the free energy of binding to their complement under standard reaction conditions,
e.g. with a standard bubble sort, Baase, Computer Algorithms (Addison-Wesley,
Menlo Park, 1978). For example, the following table lists the ten 6-mers with the
greatest stability and the ten 6-mers with the least stability, ranked from most
to least stable in terms of free energy of duplex formation under standard
hybridization conditions (the free energies being computed via Breslauer (cited
above)):



Ranking    Sequence (5'→3')
1          GCGCGC
2          CGCGCG
3          CCCGCG
4          CGCCCG
5          CGCGCC
6          CGCGGC
7          CGGCGC
8          GCCGCG
9          GCGCCG
10         GCGCGG
.
.
4087       TCATAT
4088       TGATAT
4089       CATATA
4090       TATATG
4091       ATCATG
4092       ATGATG
4093       CATCAT
4094       CATGAT
4095       CATATG
4096       ATATAT



Thus, if a stringency class consisted of the first ten 6-mers, the mixture of monomers
for the first (3'-most) position would be 0:4:6:0 (A:C:G:T), for the second position
it would be 0:6:4:0, and so on. If a stringency class consisted of the last ten 6-mers,
the mixture of monomers for the first position would be 1:0:4:5, for the second
position it would be 5:0:0:5, and so on. The resulting mixtures may then be further
enriched for sequences of the desired stringency class by thermal elution, e.g.
Miyazawa et al, J. Mol. Biol., 11: 223-237 (1965).
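
The monomer ratios quoted above can be reproduced by counting bases position by position, starting from the 3' end of each sequence. The following is a minimal sketch; the function and variable names are illustrative, and the sequences are the two groups of ten 6-mers listed in the ranking.

```python
from collections import Counter

# The ten most stable and ten least stable 6-mers from the ranking, 5'->3'.
TOP_TEN = ["GCGCGC", "CGCGCG", "CCCGCG", "CGCCCG", "CGCGCC",
           "CGCGGC", "CGGCGC", "GCCGCG", "GCGCCG", "GCGCGG"]
BOTTOM_TEN = ["TCATAT", "TGATAT", "CATATA", "TATATG", "ATCATG",
              "ATGATG", "CATCAT", "CATGAT", "CATATG", "ATATAT"]


def monomer_ratios(stringency_class):
    """Per-position (A, C, G, T) counts, starting from the 3'-most position.

    Sequences are written 5'->3', so the 3'-most position is the last letter.
    """
    length = len(stringency_class[0])
    ratios = []
    for offset in range(1, length + 1):
        counts = Counter(seq[-offset] for seq in stringency_class)
        ratios.append(tuple(counts[base] for base in "ACGT"))
    return ratios


if __name__ == "__main__":
    for name, group in (("top ten", TOP_TEN), ("bottom ten", BOTTOM_TEN)):
        print(name, monomer_ratios(group)[:2])
```

Counting from the 3' end matches the direction of conventional solid phase synthesis, in which the 3'-most monomer is coupled first.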

More conveniently, stringency classes containing several hundred to several
thousands of oligonucleotides may be synthesized directly by a variety of parallel
synthesis approaches, e.g. Frank et al, U.S. patent 4,689,405; Matson et al, Anal.
Biochem., 224: 110-116 (1995); Fodor et al, International application
PCT/US93/04145; Pease et al, Proc. Natl. Acad. Sci., 91: 5022-5026 (1994);
Southern et al, J. Biotechnology, 35: 217-227 (1994); Brennan, International
application PCT/US94/05896; or the like.

In some cases it may be desirable to form additional stringency classes of
oligonucleotide probes by placing in a separate subset oligonucleotide probes
having complementary sequences to other oligonucleotide probes in a subset or
oligonucleotide probes that are susceptible of forming oligonucleotide probe-dimers.

Clearly, one of ordinary skill in the art could combine features of the
embodiments set forth above to design still further embodiments in accordance with
the invention, but not expressly set forth above.

The invention also includes systems and apparatus for carrying out the method
of the invention automatically. Such systems and apparatus can take a variety of
forms depending on several design constraints, including i) the nature of the solid
phase support used to anchor the target polynucleotide, ii) the degree of parallel
operation desired, iii) the detection scheme employed, iv) whether reagents are
re-used or discarded, and the like. Generally, the apparatus comprises a series of
reagent reservoirs, one or more reaction vessels containing target polynucleotide,
preferably attached to a solid phase support, e.g. magnetic beads, one or more
detection stations, and a computer controlled means for transferring in a
predetermined manner reagents from the reagent reservoirs to and from the
reaction vessels and the detection stations. The computer controlled means for
transferring reagents and controlling temperature can be implemented by a variety
of general purpose laboratory robots, such as that disclosed by Harrison et al,
Biotechniques, 14: 88-97 (1993); Fujita et al, Biotechniques, 9: 584-591 (1990);
Wada et al, Rev. Sci. Instrum., 54: 1569-1572 (1983); or the like. Such laboratory
robots are also available commercially, e.g. Applied Biosystems model 800
Catalyst (Foster City, CA).

A variety of kits may be provided for carrying out different embodiments of the
invention. Generally, kits of the invention include oligonucleotide probes, initializing
oligonucleotides, and a detection system. Kits further include ligation reagents and
instructions for practicing the particular embodiment of the invention. In embodiments
employing protein ligases, RNase H, nucleic acid polymerases, or other enzymes, their
respective buffers may be included. In some cases, these buffers may be identical.
Preferably, kits also include a solid phase support, e.g. magnetic beads, for anchoring
templates. In one preferred kit, fluorescently labeled oligonucleotide probes are provided
such that probes corresponding to different terminal nucleotides of the target
polynucleotide carry distinct spectrally resolvable fluorescent dyes. As used herein,
"spectrally resolvable" means that the dyes may be distinguished on the basis of their
spectral characteristics, particularly fluorescence emission wavelength, under conditions
of operation. Thus, the identity of the one or more terminal nucleotides would be
correlated to a distinct color, or perhaps ratio of intensities at different wavelengths.
More preferably, four such probes are provided that allow a one-to-one correspondence
between each of four spectrally resolvable fluorescent dyes and the four possible terminal
nucleotides on a target polynucleotide. Sets of spectrally resolvable dyes are disclosed in
U.S. patents 4,855,225 and 5,188,934; International application PCT/US90/05565; and
Lee et al, Nucleic Acids Research, 20: 2471-2483 (1992).
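
The one-to-one correspondence between dyes and terminal nucleotides can be pictured as a lookup from the dominant dye signal to a base. The sketch below is hypothetical throughout: the particular dye-to-base pairing, the intensity values, and the `min_ratio` threshold are assumptions made for illustration and are not taken from the text.

```python
# Hypothetical dye-to-nucleotide assignment (an assumption for illustration;
# the text does not specify which dye is paired with which nucleotide).
DYE_TO_BASE = {"FAM": "A", "JOE": "C", "TAMRA": "G", "ROX": "T"}


def call_base(intensities, min_ratio=2.0):
    """Call a base from per-dye intensities, or return None if ambiguous.

    A call is made only when the strongest signal exceeds the runner-up
    by at least `min_ratio` (an arbitrary illustrative threshold).
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (top_dye, top), (_, runner_up) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * runner_up:
        return None
    return DYE_TO_BASE[top_dye]


if __name__ == "__main__":
    print(call_base({"FAM": 900, "JOE": 40, "TAMRA": 25, "ROX": 10}))
    print(call_base({"FAM": 500, "JOE": 450, "TAMRA": 25, "ROX": 10}))
```

Returning no call for an ambiguous reading is one simple way to express the "ratio of intensities" idea; a practical system would calibrate the threshold against the dyes and optics actually used.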

Example 1 

Sequencing a Target Polynucleotide Amplified from pUC19
with Four Initializing Oligonucleotides

In this example, a template comprising a binding region and a portion of
the pUC19 plasmid is amplified by PCR and attached to magnetic beads. Four
initializing oligonucleotides are employed in separate reactions as indicated below.
8-mer oligonucleotide probes are employed having 4 central ribonucleotides and
both 5' and 3' monophosphates, as shown in the following formula:

OP(=O)(O-)O-(5')BBRRRRBB(3')-OP(=O)(O-)O

After annealing, probes are enzymatically ligated to the initializing oligonucleotides
and the magnetic bead supports are washed. The 3' phosphates of the ligated
probes are removed with phosphatase, after which the probes are extended with
DNA polymerase in the presence of the four labeled dideoxynucleoside
triphosphate chain terminators. After washing and identification of the extended
nucleotide, the ligated probes are cleaved at the ribonucleotide moiety with RNase
H to remove the label and to regenerate an extendable end.

The following double stranded fragment comprising a 36-mer binding 
region is ligated into a Sac I/Xma I-digested pUC19: 

CCTCTCCCTTCCCTCTCCTCCCTCTCCCCTCTCCCTC
TCGAGGAGAGGGAAGGGAGAGGAGGGAGAGGGGAGAGGGAGGGCC

After isolation and amplification, a 402 basepair fragment of the modified pUC19 is
amplified by PCR for use as a template. The fragment spans a region of pUC19
from position 41 to the binding region inserted adjacent to the Sac I site in the
polylinker region (position 413 of the unmodified pUC19), Yanisch-Perron et al,
Gene, 33: 103-119 (1985). Two 18-mer oligonucleotide probes are employed
having sequences 5'-CCCTCTCCCCTCTCCCTCx-3' and
5'-GCAGCTCCCGGAGACGGT-3', where "x" is a 3' biotin moiety attached during
synthesis using a commercially available reagent with the manufacturer's protocol,
e.g. 3' Biotin-ON CPG (Clontech Laboratories, Palo Alto, California). The amplified
template is isolated and attached to streptavidin-coated magnetic beads (Dynabeads)
using the manufacturer's protocol, Dynabeads Template Preparation Kit, with
M280-streptavidin (Dynal, Inc., Great Neck, New York). A sufficient quantity of the
biotinylated 313 basepair fragment is provided to load about 300 µg of Dynabeads
M280-Streptavidin.

The binding region sequence is chosen so that the duplexes formed with the 
initiating oligonucleotides have compositions of about 66% GC to enhance duplex 
stability. The sequence is also chosen to prevent secondary structure formation 
and fortuitous hybridization of an initializing oligonucleotide to more than one 
location within the binding region. Any shifting of position of a given initializing
oligonucleotide within the binding region results in a significant number of
mismatched bases.

After loading, the non-biotinylated strand of template is removed by heat 
denaturation, after which the magnetic beads are washed and separated into four 
aliquots. The template attached to the magnetic beads has the following 
sequence: 

(Magnetic bead)-(linker)-(3')-CTCCCTCTCCCCTCTCCCTCCTC
TCCCTTCCTCTCCTCGAGCTTAAGT ... CTCGACG-(5')

The following four oligonucleotides are employed as initializing oligonucleotides 
in each of the separate aliquots of template: 

5'-GAGGAGAGGGAAGGAGAGGAG
5'-GGAGGAGAGGGAAGGAGAGGA
5'-GGGAGGAGAGGGAAGGAGAGG
5'-AGGGAGGAGAGGGAAGGAGAG
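
The four initializing oligonucleotides listed above are shifted by one nucleotide relative to one another. The sketch below checks that stagger, computes the GC fraction of each oligonucleotide, and then, under explicitly stated assumptions, lists which template positions would be read from each register. The helper names are illustrative, and the model in `interrogated_offsets` (the base read in each cycle lies just past the ligated 8-mer, and cleavage in the center of the four ribonucleotides leaves four probe nucleotides on the duplex) is an interpretation rather than something the text states in these terms.

```python
INITIALIZERS = [
    "GAGGAGAGGGAAGGAGAGGAG",
    "GGAGGAGAGGGAAGGAGAGGA",
    "GGGAGGAGAGGGAAGGAGAGG",
    "AGGGAGGAGAGGGAAGGAGAG",
]


def is_one_base_stagger(first, second):
    """True if `second` equals `first` shifted by one position toward 5'."""
    return len(first) == len(second) and second[1:] == first[:-1]


def gc_fraction(seq):
    """Fraction of G and C nucleotides in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def interrogated_offsets(register, cycles, probe_len=8, retained=4):
    """Template offsets read in successive cycles for one register.

    Assumptions: the base read lies just past the ligated probe, and
    cleavage leaves `retained` probe nucleotides on the extended duplex.
    """
    return [register + probe_len + retained * c for c in range(cycles)]


if __name__ == "__main__":
    for a, b in zip(INITIALIZERS, INITIALIZERS[1:]):
        print(is_one_base_stagger(a, b))
    print([round(gc_fraction(s), 3) for s in INITIALIZERS])
    covered = sorted(o for r in range(4) for o in interrogated_offsets(r, 3))
    print(covered)
```

Under this model each register reads every fourth template position, so four registers offset by one nucleotide together cover every position, which would explain the use of four separate aliquots.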

Reactions and washes below are generally carried out in 50 µL volumes of
manufacturer's (New England Biolabs') recommended buffers for the
enzymes employed, unless otherwise indicated. Standard buffers are also
described in Sambrook et al, Molecular Cloning, 2nd Edition (Cold Spring
Harbor Laboratory Press, 1989).

96 stringency classes of 684 or 682 oligonucleotide probes each (2 subsets for each
of 48 different annealing temperatures) are formed which together contain all 8-mer
probes for each of the four aliquots. The probes of each of the 96 classes are separately
annealed to the target polynucleotide in reaction mixtures having the same components,
with the exception that extensions and ligations are carried out with Sequenase and T4
DNA ligase at temperatures less than 37°C and with Taq Stoffel fragment and a
thermostable ligase otherwise.
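
The class sizes quoted above can be checked against the size of a complete 8-mer set. The sketch below solves for how many classes of each size are needed so that 96 classes together hold all 4^8 probes; the function name and its error handling are illustrative assumptions.

```python
def split_counts(total, n_classes, small, large):
    """Return (n_small, n_large) such that
    n_small * small + n_large * large == total and n_small + n_large == n_classes.

    Raises ValueError if no such whole-number split exists.
    """
    excess = total - n_classes * small
    step = large - small
    if step <= 0 or excess < 0 or excess % step != 0:
        raise ValueError("no whole-number split for these sizes")
    n_large = excess // step
    if n_large > n_classes:
        raise ValueError("no whole-number split for these sizes")
    return n_classes - n_large, n_large


if __name__ == "__main__":
    n_small, n_large = split_counts(4 ** 8, 96, small=682, large=684)
    print(n_small, n_large, n_small * 682 + n_large * 684)
```

With these inputs the split is consistent: the two class sizes can account for exactly 65,536 probes across 96 classes.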

The 48 stringency conditions are defined by annealing temperatures which range
from 22°C to 70°C, such that each grouping of subsets at the same temperature differ in
annealing temperature by 1°C from that of the subset groupings containing the next
highest and next lowest stringency classes. The range of annealing temperatures
(22-70°C) is roughly bounded by the temperatures 5-10 degrees below the temperatures at
which the least stable and most stable 8-mers, respectively, are expected to have about
fifty percent maximum annealing in a standard PCR buffer solution.

After 5-10 minutes incubation at 80°C, the reaction mixtures are brought down to
their respective annealing temperatures over a period of 20-30 minutes. After ligation,
washing and treatment with phosphatase, 2 units of polymerase and labeled
dideoxynucleotide triphosphates (0.08 mM final reaction concentration and labeled with
TAMRA (tetramethylrhodamine), FAM (fluorescein), ROX (rhodamine X), and
JOE (2',7'-dimethoxy-4',5'-dichlorofluorescein)) are added. After 15 minutes, the
beads are washed with H2O and the identity of the extended nucleotide is determined by
illuminating each reaction mixture with standard wavelengths, e.g. User's Manual, model
373 DNA Sequencer (Applied Biosystems, Foster City, CA).

After identification, the reaction mixtures are treated with RNase H
using the manufacturer's suggested protocol and washed. The RNase H
treated extended duplexes have regenerated 3' hydroxyls and are ready for
the next cycle of ligation/extension/cleavage. The cycles are carried out
until all the nucleotides of the test sequence are identified.

Example 2 

Sequencing a Target Polynucleotide Amplified from pUC19
with One Initializing Oligonucleotide

In this example, a template is prepared in accordance with Example 1,
except that since extension is in the 5'→3' direction in this example, the biotin
moiety is attached to the 5' end of the primer hybridizing to the CT-rich strand of
the binding region. Thus, in this example, the binding region of the single stranded
template will be a GA-rich segment (essentially the complement of the binding
region of Example 1). Two 18-mer oligonucleotide probes are employed having
sequences 5'-xGAGGGAGAGGGGAGAGGG-3' and
5'-ACCGTCTCCGGGAGCTGC-3', where "x" is a 5' biotin moiety attached during
synthesis using commercially available reagents with manufacturers' protocols, e.g.
the Aminolink aminoalkylphosphoramidite linking agent (Applied Biosystems,
Foster City, California) and Biotin-X-NHS Ester available from Clontech
Laboratories (Palo Alto, California).

A single 21-mer initializing oligonucleotide is employed with the following
sequence:

5'-OP(=O)(O-)O-CCTCTCCCTTCCCTCTCCTCC-3'

6-mer oligonucleotide probes are employed that have an acid labile
phosphoramidate linkage between the 3'-most nucleoside and 3'-penultimate
nucleoside of the probe, as shown in the following formula:

HO-(3')B(5')-OP(=O)(O-)NH-(3')BBBBBt*

where Bt* is a JOE-, FAM-, TAMRA-, or ROX-labeled dideoxynucleoside, such
that the label corresponds to the identity of the 3'-most nucleotide (so 16 different
labeled dideoxynucleosides are used in the synthesis of the probes).

As above, the 6-mer probes are prepared in 96 stringency classes of
42 or 43 probes each (2 subsets for each of 48 different annealing
temperatures). Hybridizations and ligations are carried out as described
above. After ligation and washing, a nucleoside in the target polynucleotide
is identified by the fluorescent signal of the oligonucleotide probe. Acid
cleavage is then carried out by treating the extended duplex with 0.8%
trifluoroacetic acid in dichloromethane for 40 minutes at room temperature to
regenerate an extendable end on the extended duplex. The process continues
until the sequence of the target polynucleotide is determined.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(ii) TITLE OF INVENTION: DNA Sequencing by 
Stepwise Extension with Oligonucleotide Blocks 

(iii) NUMBER OF SEQUENCES: 8 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Stephen C. Macevicz

(B) STREET: 21890 Rucker Drive 

(C) CITY: Cupertino 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 95014 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: 3.5 inch diskette 

(B) COMPUTER: IBM compatible

(C) OPERATING SYSTEM: Windows 3.1/DOS 5.0

(D) SOFTWARE: Microsoft Word for Windows, vers. 2.0

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Stephen C. Macevicz 

(B) REGISTRATION NUMBER: 30,285 

(C) REFERENCE / DOCKET NUMBER: peol 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (415) 638-5552 

(B) TELEFAX: 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCTCTCCCTT CCCTCTCCTC CCTCTCCCCT CTCCCTC

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GAGGAGAGGG AAGGAGAGGA G



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GGAGGAGAGG GAAGGAGAGG A 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GGGAGGAGAG GGAAGGAGAG G 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AGGGAGGAGA GGGAAGGAGA G 



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GAGGGAGAGG GGAGAGGG



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ACCGTCTCCG GGAGCTGC 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CCTCTCCCTT CCCTCTCCTC C 






I claim:

1. A method for identifying a sequence of nucleotides in a
polynucleotide, the method comprising the steps of:

(a) extending an initializing oligonucleotide along the polynucleotide
by ligating an oligonucleotide probe thereto to form an extended duplex;

(b) identifying one or more nucleotides of the polynucleotide; and

(c) repeating steps (a) and (b) until the sequence of nucleotides is
determined.

2. The method of claim 1 wherein said oligonucleotide probe has a 
chain-terminating moiety at a terminus distal to said initializing 
oligonucleotide. 

3. The method of claim 2 wherein said step of identifying includes
removing said chain-terminating moiety and extending said oligonucleotide
probe with a nucleic acid polymerase in the presence of one or more labeled
chain-terminating nucleoside triphosphates.

4. The method of claim 3 further including a step of regenerating an
extendable terminus on said extended duplex.

5. The method of claim 4 wherein said oligonucleotide probe includes a
subsequence of four ribonucleotides and wherein said step of regenerating
includes cleaving said oligonucleotide probe with RNase H.

6. The method of claim 5 wherein said chain-terminating moiety is a 3' 
phosphate. 

7. The method of claim 2 further including a step of capping an
extended duplex or said initializing oligonucleotide whenever the extended
duplex or said initializing oligonucleotide fails to ligate to said
oligonucleotide probe.

8. The method of claim 2 further including a step of regenerating an
extendable terminus on said extended duplex.



9. The method of claim 8 wherein said step of regenerating includes
cleaving a chemically scissile internucleosidic linkage in said extended
duplex.

10. The method of claim 9 wherein said chemically scissile 
internucleosidic linkage is a phosphoramidate. 

11. The method of claim 8 wherein said step of regenerating includes
enzymatically cleaving an internucleosidic linkage in said extended duplex.

12. The method of claim 11 wherein said oligonucleotide probe includes
a subsequence of four ribonucleotides and wherein said step of regenerating
includes cleaving said oligonucleotide probe with RNase H.

13. A method for determining the nucleotide sequence of a 
polynucleotide, the method comprising the steps of: 

(a) providing a template comprising the polynucleotide;

(b) providing an initializing oligonucleotide which forms a duplex 
with the template adjacent to the polynucleotide; 

(c) annealing an oligonucleotide probe to the template adjacent to the 
initializing oligonucleotide; 

(d) ligating the oligonucleotide probe to the initializing
oligonucleotide to form an extended duplex;

(e) identifying one or more nucleotides of the polynucleotide by a 
label on the ligated oligonucleotide probe; and 

(f) repeating steps (c) through (e) until the nucleotide sequence of
the polynucleotide is determined.
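
The cycle of steps (c) through (f) can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not part of the claim: each probe reports a single base, the duplex advances one position per cycle, strand orientation is ignored, and the dye names and function names are hypothetical.

```python
# Toy model of the anneal / ligate / identify cycle; not the actual chemistry.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical labeling scheme: one dye per probe base (an assumption).
LABEL_OF_BASE = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}
BASE_OF_LABEL = {label: base for base, label in LABEL_OF_BASE.items()}


def anneal_probe(template_base):
    """Step (c): choose the probe base that pairs with the next template base."""
    return COMPLEMENT[template_base]


def sequence_by_ligation(target):
    """Steps (c) through (f): identify one template base per ligation cycle."""
    extended_duplex = []  # probe bases ligated so far, step (d)
    called = []           # template bases identified so far, step (e)
    for template_base in target:
        probe_base = anneal_probe(template_base)
        extended_duplex.append(probe_base)   # ligation extends the duplex
        label = LABEL_OF_BASE[probe_base]    # signal read from the ligated probe
        # The label reveals the probe base; its complement is the template base.
        called.append(COMPLEMENT[BASE_OF_LABEL[label]])
    return "".join(called)


if __name__ == "__main__":
    print(sequence_by_ligation("GATTACA"))
```

In this model the loop ends when the target is exhausted, which stands in for the condition "until the nucleotide sequence of the polynucleotide is determined" in step (f).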

14. The method of claim 13 wherein said oligonucleotide probe has a
chain-terminating moiety at a terminus distal to said initializing
oligonucleotide and wherein said method further includes a step of
regenerating an extendable terminus on said oligonucleotide probe.

15. The method of claim 14 further including a step of capping said
extended duplex or said initializing oligonucleotide that fails to ligate to
said oligonucleotide probe.

16. The method of claim 14 wherein said step of identifying consists of
identifying a single nucleotide of said polynucleotide.

17. The method of claim 16 wherein said step of identifying includes
removing said chain-terminating moiety and extending said oligonucleotide
probe with a nucleic acid polymerase in the presence of one or more labeled
chain-terminating nucleoside triphosphates.

18. An oligonucleotide probe of the formula:

HO-(3')(B)j(5')-OP(=O)(O-)NH-(B)k-Bt*

wherein:

B is a nucleotide or an analog thereof;
j is in the range of from 1 to 12;
k is in the range of from 0 to 12, such that the sum of j and k is less
than or equal to 12;
Bt* is a labeled chain-terminating moiety.

19. An oligonucleotide probe selected from the group consisting of:

OP(=O)(O-)O-(5')(B)sRRRR(B)wBt*

HO-(3')(B)sRRRR(B)wBt*

and

OP(=O)(O-)O-(5')(B)sRRRR(B)w(3')OP(=O)(O-)O

wherein:

B is a deoxyribonucleotide or an analog thereof;
R is a ribonucleotide;
s is in the range of from 1 to 8;
w is in the range of from 0 to 8, such that the sum of s and w is less
than or equal to 8;
Bt* is a labeled chain-terminating moiety.